An Integrated Tool for Annotating Historical Corpora

Authors

  • Pablo Picasso Feliciano de Faria
  • Fábio Natanael Kepler
  • Maria Clara Paixão de Sousa
Abstract

E-Dictor is a tool for encoding ancient texts, applying levels of edition to them, and assigning part-of-speech tags. In short, it works as a WYSIWYG interface for encoding text in XML. It grew out of experience gained while building the Tycho Brahe Parsed Corpus of Historical Portuguese and from consortium activities with other research groups. Preliminary results show a decrease of at least 50% in the overall time spent on the editing process.
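As a rough illustration of what encoding a word in XML with an edition level and a part-of-speech tag might look like, here is a minimal Python sketch. The element and attribute names (`w`, `o`, `e`, `t`, `pos`), the helper `encode_word`, the tag value `DET`, and the sample word are all assumptions made for this example; the abstract does not describe E-Dictor's actual schema.

```python
import xml.etree.ElementTree as ET


def encode_word(original, editions, pos):
    """Build a <w> element with the original form, edited forms, and a POS tag.

    The element and attribute names are hypothetical, chosen for illustration only.
    """
    w = ET.Element("w", {"pos": pos})
    o = ET.SubElement(w, "o")          # original (transcribed) form
    o.text = original
    for level, form in editions.items():
        e = ET.SubElement(w, "e", {"t": level})  # one element per edition level
        e.text = form
    return w


# Illustrative input: an older spelling and its modernized form.
word = encode_word("hũa", {"modernization": "uma"}, "DET")
xml_string = ET.tostring(word, encoding="unicode")
print(xml_string)
```

Keeping the original form and each edited form side by side in one element is one way to let a reader switch between edition levels without losing the transcription.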


Similar resources

eBonsai: An Integrated Environment for Annotating Treebanks

Syntactically annotated corpora (treebanks) play an important role in recent statistical natural language processing. However, building a large treebank is labor-intensive and time-consuming work. To remedy this problem, there have been many attempts to develop software tools for annotating treebanks. This paper presents an integrated environment for annotating a treebank, called eBonsai. eBons...


GerManC - Towards a Methodology for Constructing and Annotating Historical Corpora

For the 'Digital Historical Corpora Architecture, Annotation, and Retrieval' Conference, 03-08 December 2006, Dagstuhl (D). Astrid Ensslin, Martin Durrell, Paul Bennett, University of Manchester (UK). Our paper focuses on the one hand on the challenges posed by the structural variability, flexibility and ambiguity found in...


Arabic anaphora resolution: corpora annotation with coreferential links

Annotated resources are much needed for the evaluation and training of anaphora resolution systems. Coreferential chain annotation is a difficult task which cannot be realised without an appropriate tool. In this paper, we present our work on Arabic corpora annotation with anaphoric links (i.e., the annotation of the identity relation between the anaphors and their antecedents). In particular,...


TESLA: A Tool for Annotating Geospatial Language Corpora

In this paper, we present The gEoSpatial Language Annotator (TESLA)—a tool which supports human annotation of geospatial language corpora. TESLA interfaces with a GIS database for annotating grounded geospatial entities and uses Google Earth for visualization of both entity search results and evolving object and speaker position from GPS tracks. We also discuss a current annotation effort using...


BECAM tool - a semi-automatic tool for bootstrapping emotion corpus annotation and management

Corpus annotation is an important aspect in speech applications where stochastic models need to be trained and evaluated. Multimodal corpora are also annotated. Moreover, corpus annotation is an essential phase in the construction of emotion recognizer engines. Large corpora, as they are essential to construct representative knowledge bases, have been a problem for corpus annotators. Time consu...




Publication date: 2010